Regret Transfer and Parameter Optimization

Authors

  • Noam Brown
  • Tuomas Sandholm
Abstract

Regret matching is a widely used algorithm for learning how to act. We begin by proving that regrets on actions in one setting (game) can be transferred to warm-start the regrets for solving a different setting with the same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our knowledge, the first principled warm-starting method for no-regret learning. It also extends to warm-starting the widely adopted counterfactual regret minimization (CFR) algorithm for large incomplete-information games; we show this experimentally as well. We then study optimizing a parameter vector for a player in a two-player zero-sum game (e.g., optimizing bet sizes to use in poker). We propose a custom gradient descent algorithm that provably finds a locally optimal parameter vector while leveraging our warm-start theory to significantly save regret-matching iterations at each step. It optimizes the parameter vector while simultaneously finding an equilibrium. We present experiments in no-limit Leduc Hold’em and no-limit Texas Hold’em to optimize bet sizing. This amounts to the first action abstraction algorithm (an algorithm for selecting a small number of discrete actions to use from a continuum of actions, which is a key preprocessing step for solving large games using current equilibrium-finding algorithms) with convergence guarantees for extensive-form games.
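To make the warm-start idea in the abstract concrete, here is a minimal sketch of standard regret matching in self-play on a small zero-sum matrix game. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `init_regrets` argument, and the single `scale` factor are hypothetical, and the abstract does not give the actual discounting rule, so `scale` is only a placeholder for it.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_matrix_game(payoff, iterations, init_regrets=None, scale=1.0):
    """Self-play regret matching on a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives -payoff[i][j].
    init_regrets: optional (row_regrets, col_regrets) carried over from another game.
    scale: hypothetical discount on those prior regrets (NOT the paper's formula).
    Returns ((row_avg, col_avg), (row_regrets, col_regrets)).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    if init_regrets is None:
        row_reg, col_reg = [0.0] * n_rows, [0.0] * n_cols
    else:
        row_reg = [scale * r for r in init_regrets[0]]
        col_reg = [scale * r for r in init_regrets[1]]
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        x = regret_matching_strategy(row_reg)
        y = regret_matching_strategy(col_reg)
        # Expected payoff of each pure action against the opponent's current mix.
        row_vals = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_vals = [-sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_ev = sum(x[i] * row_vals[i] for i in range(n_rows))
        col_ev = sum(y[j] * col_vals[j] for j in range(n_cols))
        # Regret of an action: what it would have earned minus what was earned.
        for i in range(n_rows):
            row_reg[i] += row_vals[i] - row_ev
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_reg[j] += col_vals[j] - col_ev
            col_sum[j] += y[j]

    row_avg = [s / iterations for s in row_sum]
    col_avg = [s / iterations for s in col_sum]
    return (row_avg, col_avg), (row_reg, col_reg)


if __name__ == "__main__":
    game_a = [[2.0, -1.0], [-1.0, 1.0]]
    game_b = [[2.2, -1.0], [-1.0, 1.0]]  # same structure, slightly different payoffs
    (x, y), regrets = solve_matrix_game(game_a, 20000)
    print("cold start:", x, y)
    # Reuse game_a's regrets for game_b with an arbitrary placeholder discount.
    (x2, y2), _ = solve_matrix_game(game_b, 2000, init_regrets=regrets, scale=0.1)
    print("warm start:", x2, y2)
```

In this sketch the average strategies, not the final ones, are what approach an equilibrium. How much the prior regrets must be discounted for the guarantees to carry over is the subject of the paper; the fixed `scale=0.1` used here has no such justification.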

Similar resources

Robustness in portfolio optimization based on minimax regret approach

Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of this research attempts to make Markowitz’s primary portfolio selection model more realistic or seeks to solve the model to obtain fairly optimal portfolios. An efficient frontier in the ...

Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce...

Parameter-Free Convex Learning through Coin Betting

We present a new parameter-free algorithm for online linear optimization over any Hilbert space. It is theoretically optimal, with regret guarantees as good as with the best possible learning rate. The algorithm is simple and easy to implement. The analysis is given via the adversarial coin-betting game, Kelly betting and the Krichevsky-Trofimov estimator. Applications to obtain parameter-free ...

Some tractable instances of interval data minmax regret problems: bounded distance from triviality (short version)

This paper focuses on tractable instances of interval data minmax regret graph problems. More precisely, we provide polynomial and pseudopolynomial algorithms for sets of particular instances of the interval data minmax regret versions of the shortest path, minimum spanning tree and weighted (bipartite) perfect matching problems. These sets are defined using a parameter that measures the distan...

Coin Betting and Parameter-Free Online Learning

In recent years, a number of parameter-free algorithms have been developed for online linear optimization over Hilbert spaces and for learning with expert advice. These algorithms achieve optimal regret bounds that depend on the unknown competitors, without having to tune the learning rates with oracle choices. We present a new intuitive framework to design parameter-free algorithms based o...

Publication date: 2014